Intro to Data Viz

Jim Rose

Lecture for IBS519, Fall 2023

Who am I?

  • One of your TAs for this course

  • PhD student (5th year, Genetics & Molecular Biology program)

  • Computational immunology/gene regulation research

  • Self-taught coder (R & python)

My website (build one for yourself!):

Main Q’s for Today:

  1. Why look at (visualize) data?
  2. How should I look at data?
  3. What the heck is ggplot2? And how do I use it?

Part 1: Why visualize data?

  • To explore or understand data better
  • To convey a point (or make an argument)

Visualizing to explore data

Sometimes visualizing data can tell you more than summary statistics…
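A minimal sketch of this idea, assuming Anscombe's quartet as the illustration (the slide does not say which dataset it used). The `anscombe` data frame ships with base R; the names `quartet`, `quartet_summary`, and `quartet_plot` are made up for this example.

```r
library(ggplot2)

# Reshape the built-in `anscombe` data (columns x1..x4, y1..y4) into long format
quartet <- do.call(rbind, lapply(1:4, function(i) {
  data.frame(set = paste("Set", i),
             x = anscombe[[paste0("x", i)]],
             y = anscombe[[paste0("y", i)]])
}))

# Summary statistics: means and correlations are nearly identical across sets
quartet_summary <- do.call(rbind, lapply(split(quartet, quartet$set), function(d) {
  data.frame(set = d$set[1],
             mean_x = mean(d$x),
             mean_y = mean(d$y),
             cor_xy = cor(d$x, d$y))
}))
print(quartet_summary)

# Plotting the same data shows four very different shapes
quartet_plot <- ggplot(quartet, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~ set)
print(quartet_plot)
```

The table alone would suggest the four sets are interchangeable; only the faceted scatterplot shows the curve, the outlier, and the single high-leverage point.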

Visualizing to convince or make a point

When data visualization goes wrong

Part 2: How should I visualize data?

“With great power comes great responsibility” – Uncle Ben Parker

How should I visualize data?

Pillar 1: Show the data (honestly)

Pillar 2: Think about perception

  • You need to think about the WAY your audience’s BRAIN will PERCEIVE your chart

  • Perceptual accuracy is influenced by the way you convey information (i.e., the type of chart used).

Think about your plot in terms of “channels”
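In ggplot2, one way to think about channels is that each argument to `aes()` assigns a variable to one visual channel. The sketch below uses the built-in `mtcars` data rather than anything from the lecture, and the choice of which variable goes to which channel is only an illustration.

```r
library(ggplot2)

# Each aes() argument maps one variable to one visual channel
channel_plot <- ggplot(mtcars, aes(x = wt,               # horizontal position
                                   y = mpg,              # vertical position
                                   color = factor(cyl),  # hue (categorical)
                                   size = hp)) +         # point area (numeric)
  geom_point(alpha = 0.7) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon",
       color = "Cylinders", size = "Horsepower")
print(channel_plot)
```

Position is generally read more accurately than size or hue, so it usually makes sense to give the two most important variables the x and y channels.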

Pillar 3: Reduce Clutter

Code
library(tidyverse)

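# Load the Pokemon stats table (path is relative to the working directory)
poke <- read_csv("./DataSets/pokemon.csv")

# Cluttered version: labelling every point makes the names overlap and hides the data
ggplot(poke, aes(x = Attack, y = Defense)) +
  geom_point() +
  geom_text(aes(label = Name))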

Pillar 3: Reduce Clutter

Code
library(tidyverse)
library(ggrepel)

poke <- read_csv("./DataSets/pokemon.csv")

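# Decluttered version: mute the points, label only the extreme Pokemon,
# and summarize where most points fall with density contours
ggplot(data = poke, aes(x = Attack, y = Defense)) +
  geom_point(alpha = 0.8, color = "darkgrey") +
  geom_text_repel(aes(label = ifelse(Defense > 125 | Attack > 140, as.character(Name), ""))) +
  geom_density2d()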

Pillar 4: Use color strategically

To indicate groups or categories
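A small sketch of color as a grouping channel, assuming the built-in `iris` data as a stand-in for the slide's figure. Mapping a categorical variable to `color` inside `aes()` gives each group its own hue and a legend; the "Dark2" ColorBrewer palette is just one reasonable qualitative choice.

```r
library(ggplot2)

# One hue per species; a qualitative palette avoids implying any ordering
group_plot <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point() +
  scale_color_brewer(palette = "Dark2")
print(group_plot)
```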

Pillar 4: Use color strategically

To highlight
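Highlighting could be sketched like this: draw everything in a muted grey and reserve one saturated color for the points you want the audience to notice. The data (`mtcars`), the `highlight` column, and the choice to emphasize 8-cylinder cars are all hypothetical, picked only to make the example runnable.

```r
library(ggplot2)

cars <- mtcars
cars$highlight <- cars$cyl == 8  # hypothetical choice of what to emphasize

# Grey for context, a single strong color for the points of interest
highlight_plot <- ggplot(cars, aes(x = wt, y = mpg, color = highlight)) +
  geom_point(size = 2) +
  scale_color_manual(values = c("FALSE" = "grey70", "TRUE" = "firebrick"),
                     labels = c("FALSE" = "Other cars", "TRUE" = "8 cylinders"),
                     name = NULL)
print(highlight_plot)
```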

DON’T MAKE RAINBOW CLOWN VOMIT CHARTS

Consider inclusivity when choosing a color palette

Figure panels: Original, Deuteranope, Tritanope
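One possible way to act on this in ggplot2 is to swap the default hues for a palette designed to stay distinguishable under common forms of color vision deficiency. The sketch below assumes R 4.0 or later, where base R's `palette.colors()` includes the Okabe-Ito palette; the slide itself does not name a specific palette, and `okabe_ito` is a name made up for this example.

```r
library(ggplot2)

# Take three Okabe-Ito colors (skipping black) and drop their names so
# scale_color_manual() assigns them to the groups in order
okabe_ito <- unname(palette.colors(palette = "Okabe-Ito")[2:4])

cb_plot <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point() +
  scale_color_manual(values = okabe_ito)
print(cb_plot)
```

ggplot2's built-in `scale_color_viridis_d()` is another option for discrete groups if you prefer not to set colors by hand.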

Pillar 5: Get creative

W.E.B. Du Bois

Paris Exhibition 1900

Consult a chart library or atlas for ideas on chart types

https://r-graph-gallery.com/

Other great resources

Data Visualization with R by Rob Kabacoff, PhD

Better Data Visualizations by Jonathan Schwabish

Data Visualization: A Practical Introduction by Kieran Healy